Abstract

Kosaraju, in "Computation of squares in a string", briefly described a linear-time algorithm for
computing the minimal squares starting at each position in a word. Using the same construction of suffix
trees, we generalize his result and describe in detail how to compute in $O(k|w|)$ time the minimal $k$th
power, with period of length larger than $s$, starting at each position in a word $w$, for arbitrary exponent
$k \ge 2$ and integer $s \ge 0$. We provide a complete proof of correctness of the algorithm, which is
not entirely clear in Kosaraju's original paper. The algorithm can be used as a subroutine to detect
certain types of pseudo-patterns in words, which was our original motivation for studying the generalization.

1 Introduction 

A word of the form $ww$ is called a square, which is the simplest type of repetition. The study of repetitions
in words dates back at least to Thue's work [21] in the early 1900s. Since then, there has been much
work in the literature on finding repetitions (periodicities), which is an important topic in combinatorics
on words. In the early 1980s, Slisenko [19] described a linear-time algorithm for finding all syntactically
distinct maximal repetitions in a word. Crochemore [5] and Main and Lorentz [15] described linear-time
algorithms for testing whether a word contains a square, and thus for testing whether a word contains any
repetition. Since a word $w$ of length $n$ may have $\Omega(n^2)$ square factors (for example, let $w = 0^n$), usually
only primitively-rooted or maximal repetitions are computed. Crochemore [4] described an $O(n \log n)$-time
algorithm for finding all maximal primitively-rooted integer repetitions, where maximal means that a $k$th
power cannot be extended in either direction to obtain a $(k+1)$th power. The $O(n \log n)$ bound is optimal since
a word $w$ of length $n$ may have $\Omega(n \log n)$ primitively-rooted repetitions (for example, let $w$ be a Fibonacci
word). Apostolico and Preparata [1] described an $O(n \log n)$-time algorithm for finding all right-maximal
repetitions, which means a repetition $x^k$ that cannot be extended to the right to obtain a repetition $y^l = x^k z$
such that $|y| \le |x|$. Main and Lorentz [14] described an $O(n \log n)$-time algorithm for finding all maximal
repetitions. Gusfield and Stoye [20, 10] also described several algorithms for finding repetitions. We know
that both the number of distinct squares [8] and the number of maximal repetitions (also called runs) [12] in
a word are in $O(n)$. This fact suggests the existence of linear-time algorithms for repetitions that are distinct
(respectively, maximal). Main [16] described a linear-time algorithm for finding all leftmost occurrences of
distinct maximal repetitions. Kolpakov and Kucherov [12] described a linear-time algorithm for finding all
occurrences of maximal repetitions. For a recent survey on the topic of repetitions in words, see the
paper [6].

Instead of considering repetitions from a global point of view, some works take a local point of view,
which means considering repetitions at each position in a word. Kosaraju, in a five-page extended abstract [13], briefly
described a linear-time algorithm for finding the minimal square starting at each position of a given word.
His algorithm is based on an alteration of Weiner's linear-time algorithm for suffix-tree construction. In
the same flavor, Duval, Kolpakov, Kucherov, Lecroq, and Lefebvre [7] described a linear-time algorithm
for finding the local periods (squares) centered at each position of a given word. There may be $\Omega(\log n)$
primitively-rooted maximal repetitions starting at the same position (for example, consider the leftmost
position in Fibonacci words). So, neither of the two results can be obtained with the same efficiency by
directly applying linear-time algorithms for finding maximal repetitions.

In this paper, we generalize Kosaraju's algorithm [13] for computing minimal squares. Instead of squares,
we discuss arbitrary $k$th powers and show that Kosaraju's algorithm, with proper modification, can in fact compute
minimal $k$th powers. Using the same construction of suffix trees, for arbitrary integers $k \ge 2$ and $s \ge 0$,
we describe in detail an $O(k|w|)$-time algorithm for finding the minimal $k$th power, with period of length
larger than $s$, starting at each position of a given word $w$. "The absence of a complete proof prevents the
comprehension of the algorithm (Kosaraju's algorithm) in full details ..." [7]. In this paper, we provide
a complete proof of correctness of the modified algorithm. At the end, we show how this $O(k|w|)$-time
algorithm can be used as a subroutine to detect certain types of pseudo-patterns in words, which was the
original reason for studying this algorithm.



2 Preliminary 

Let $w = a_1 a_2 \cdots a_n$ be a word. The length $|w|$ of $w$ is $n$. A factor $w[p..q]$ of $w$ is the word $a_p a_{p+1} \cdots a_q$ if
$1 \le p \le q \le n$; otherwise $w[p..q]$ is the empty word $\epsilon$. In particular, $w[1..q]$ and $w[p..n]$ are called prefix
and suffix, respectively. The reverse of $w$ is the word $w^R = a_n \cdots a_2 a_1$. Word $w$ is called a $k$th power for
integer $k \ge 2$ if $w = x^k$ for some non-empty word $x$, where $k$ is called the exponent and $x$ is called the period. The
2nd power and the 3rd power are called square and cube, respectively.

The minimal (local) period $mp^k_s(w)$ larger than $s$ of word $w$ with respect to exponent $k$ is the smallest
integer $m > s$ such that $w[1..km]$ is a $k$th power, if there is such an integer, or otherwise $+\infty$. For example,
$mp^2_0(0100101001) = 3$ and $mp^2_3(0100101001) = 5$. The following results follow naturally from the definition
of minimal period.
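
The definition can be checked directly with a brute-force sketch. This is only an illustration of the definition, not the suffix-tree algorithm of this paper; the function name `mp` and the 0-based Python slicing are choices made for this example.

```python
import math

def mp(w: str, k: int, s: int) -> float:
    """Smallest m > s such that the prefix of w of length k*m is a k-th power.

    Returns math.inf when no such m exists (brute force, for illustration only).
    """
    n = len(w)
    for m in range(s + 1, n // k + 1):
        # w[1..km] is a k-th power with period m iff it equals (w[1..m])^k.
        if w[:k * m] == w[:m] * k:
            return m
    return math.inf

print(mp("0100101001", 2, 0))  # 3: the prefix 010010 is a square
print(mp("0100101001", 2, 3))  # 5: the whole word is 01001 repeated twice
```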

Lemma 1. Let $k \ge 2$ and $s \ge 0$ be two integers and $u$ be a word. If $mp^k_s(u) \ne +\infty$, then for any word $v$,

$$ mp^k_s(uv) = mp^k_s(u). $$

Proof. Suppose $mp^k_s(uv) < mp^k_s(u)$. We can write $uv = x^k y$ for some words $x, y$ with $|x| = mp^k_s(uv) > s$.
Then $|x^k| = k \cdot mp^k_s(uv) < k \cdot mp^k_s(u) \le |u|$ and thus $x^k$ is also a prefix of $u$. So $mp^k_s(u) \le |x| = mp^k_s(uv)$,
which contradicts our hypothesis. So $mp^k_s(uv) \ge mp^k_s(u)$. On the other hand, if any word $x^k$ is a prefix
of $u$, the word $x^k$ is also a prefix of $uv$. So $mp^k_s(uv) \le mp^k_s(u)$. Therefore, $mp^k_s(uv) = mp^k_s(u)$. □

Lemma 2. Let $k \ge 2$ and $s \ge 0$ be two integers and $u$ be a word. For any word $v$,

$$ mp^k_s(u) = \begin{cases} mp^k_s(uv), & \text{if } |u| \ge k \cdot mp^k_s(uv); \\ +\infty, & \text{otherwise.} \end{cases} $$

Proof. Suppose $mp^k_s(u) \ne +\infty$. By Lemma 1 it follows that $mp^k_s(uv) = mp^k_s(u)$ and $|u| \ge k \cdot mp^k_s(u) =
k \cdot mp^k_s(uv)$. So, by contraposition, $mp^k_s(u) = +\infty$ when $|u| < k \cdot mp^k_s(uv)$. On the other hand, when
$|u| \ge k \cdot mp^k_s(uv)$, we can write $uv = x^k w$ for some words $x, w$ such that $|x| = mp^k_s(uv)$. Then $x^k$ is also a
prefix of $u$ and thus $mp^k_s(u) \ne +\infty$. So, by Lemma 1, $mp^k_s(u) = mp^k_s(uv)$. □
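
As a sanity check of Lemma 2, the following sketch tests the case distinction exhaustively on short binary words. It is a brute-force experiment, not part of the paper's argument; the helper names `mp` and `lemma2_holds` and the length limit are assumptions made for this example.

```python
import math
from itertools import product

def mp(w: str, k: int, s: int) -> float:
    """Brute-force minimal period larger than s with respect to exponent k."""
    for m in range(s + 1, len(w) // k + 1):
        if w[:k * m] == w[:m] * k:
            return m
    return math.inf

def lemma2_holds(max_len: int = 8, k: int = 2, s: int = 0) -> bool:
    """Check mp(u) against the two cases of Lemma 2 for every split uv."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            uv = "".join(letters)
            m = mp(uv, k, s)
            for cut in range(n + 1):
                u = uv[:cut]
                # Case 1: |u| >= k * mp(uv); otherwise the value is +infinity.
                expected = m if len(u) >= k * m else math.inf
                if mp(u, k, s) != expected:
                    return False
    return True

print(lemma2_holds(8, 2, 0))
print(lemma2_holds(8, 3, 1))
```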

The right minimal period array of word $w$ with respect to exponent $k$ and period larger than $s$ is defined
by ${}^k_s rmp_w[i] = mp^k_s(w[i..n])$ for $1 \le i \le n$, and the left minimal period array of word $w$ with respect to
exponent $k$ and period larger than $s$ is defined by ${}^k_s lmp_w[i] = mp^k_s(w[1..i]^R)$ for $1 \le i \le n$. For example,

${}^2_0 rmp_{0100101001} = [3, +\infty, 1, 2, 2, +\infty, +\infty, 1, +\infty, +\infty]$, and
${}^2_0 lmp_{0100101001} = [+\infty, +\infty, +\infty, 1, +\infty, 3, 2, 2, 1, 5]$.
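
Both arrays can be reproduced from the definitions with a quadratic-time brute-force sketch. It only illustrates the definitions; the function names `rmp` and `lmp` are chosen for this example, and positions are 0-based in the code while the text uses 1-based positions.

```python
import math

def mp(w: str, k: int, s: int) -> float:
    """Brute-force minimal period larger than s with respect to exponent k."""
    for m in range(s + 1, len(w) // k + 1):
        if w[:k * m] == w[:m] * k:
            return m
    return math.inf

def rmp(w: str, k: int, s: int) -> list:
    """Right minimal period array: entry i-1 holds mp(w[i..n])."""
    return [mp(w[i:], k, s) for i in range(len(w))]

def lmp(w: str, k: int, s: int) -> list:
    """Left minimal period array: entry i-1 holds mp of the reverse of w[1..i]."""
    return [mp(w[:i + 1][::-1], k, s) for i in range(len(w))]

print(rmp("0100101001", 2, 0))
print(lmp("0100101001", 2, 0))
```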
Figure 1: Suffix tree for 0100101001 with $mp^2_0(\tau(v))$ on each node $v$

A suffix tree $T_w$ for a word $w = w[1..n]$ is a rooted tree with each edge labeled by a non-empty word
that satisfies

1. each internal node, other than the root, has at least two children,

2. each label on an edge from the same node begins with a different letter, and

3. there are exactly $n$ leaves $leaf_i$ and $\tau(leaf_i) = w[i..n] \cdot \$$ for $1 \le i \le n$,

where character \$ is a special letter not in the alphabet of $w$ and function $\tau$ is defined at each node $v$ as
the concatenation of the labels on edges along the path from the root to the node $v$. By definition, a suffix
tree for a word $w$ is unique up to renaming nodes and reordering among children. A suffix tree for the word
0100101001 is illustrated in Figure 1. For more details on suffix trees, see the book [9, Chap. 5-9].

We denote by $p(v)$, or more specifically by $p_{T_w}(v)$, the father of node $v$ in the tree $T_w$. Node $x$ is called
an ancestor of node $y$ if either $x$ is the father of $y$ or $x$ is an ancestor of $y$'s father. When node $x$ is an
ancestor of node $y$, node $y$ is called a descendant of node $x$. If node $x$ is a common ancestor of nodes $y$
and $z$ in $T_w$, by the definition of suffix tree, then $\tau(x)$ is a common prefix of $\tau(y)$ and $\tau(z)$. We denote by
$|v|$ the node-depth of node $v$ in $T_w$, which is the number of edges along the path from the root to the node
$v$. The node-depth of the root is 0 and, for any node $v$, the node-depth $|v|$ is less than or equal to $|\tau(v)|$,
which is called the depth of node $v$ in $T_w$ and is denoted by $\delta(v)$. We denote by $lca(u, v)$ the lowest common
ancestor of nodes $u$ and $v$ in a tree, which is the common ancestor of $u$ and $v$ with the largest node-depth.
After a linear-time preprocessing, the lowest common ancestor of any pair of nodes in a tree can be found
in constant time [11, 18].

Lemma 3. Let $T_w$ be the suffix tree of word $w$. If $leaf_i$ and $leaf_j$ are two leaves such that $i > j$, then the
label on the edge from $p(leaf_i)$ to $leaf_i$ is not longer than the label on the edge from $p(leaf_j)$ to $leaf_j$.

Proof. Let $n = |w|$ and words $e_i$, $e_j$ be the labels on the edges from $p(leaf_i)$ to $leaf_i$ and from $p(leaf_j)$ to
$leaf_j$, respectively. We now prove $|e_i| \le |e_j|$. Since $i > j$, by definition, we can write $\tau(leaf_j) = x\tau(leaf_i)$
for some word $x$ and thus

$$ \tau(p(leaf_j)) e_j = \tau(leaf_j) = x\tau(leaf_i) = x\tau(p(leaf_i)) e_i. $$

If $|e_j| \ge \delta(leaf_i)$, then $|e_i| \le \delta(leaf_i) \le |e_j|$. Otherwise, we can write $\tau(leaf_i) = y e_j$ for some word $y$
and thus $\tau(p(leaf_j)) = xy$. Let $leaf_k$ be another leaf that is a descendant of $p(leaf_j)$. Then we can write
$\tau(leaf_k) = \tau(p(leaf_j)) z = xyz$ for some word $z$ such that $z$ and $e_j$ differ at the first letter. The word
$yz$ is a suffix of $w$ and the longest common prefix of the two words $\tau(leaf_i) = y e_j$ and $yz$ is $y$. So there is
an ancestor $v$ of $leaf_i$ such that $\tau(v) = y$ and thus $\delta(p(leaf_i)) \ge |y|$. But $\tau(p(leaf_i)) e_i = \tau(leaf_i) = y e_j$.
Therefore, $|e_i| \le |e_j|$. □

A suffix tree for a given word $w$ can be constructed in linear time [17, 22]. Both Kosaraju's algorithm
[13] for computing ${}^2_0 rmp_w$ and our modification of his algorithm for computing ${}^k_s rmp_w$ and ${}^k_s lmp_w$ for
arbitrary $k \ge 2$ and $s \ge 0$ are based on Weiner's linear-time algorithm [23] for constructing the suffix tree
$T_w$. So we briefly describe Weiner's algorithm here.

Weiner's algorithm extends the suffix tree by considering the suffixes $w[n..n], \ldots, w[2..n], w[1..n]$ and
adding $leaf_n, \ldots, leaf_2, leaf_1$ into the suffix tree incrementally. After each extension by $w[i..n]$, the new
tree is precisely the suffix tree $T_{w[i..n]}$. The algorithm is outlined in Algorithm 1. By using indicator vectors
and inter-node links, the total time to locate each proper position $y$ at lines 9-10 can be in $O(n)$. Since how
to locate $y$ is not directly relevant to the algorithm we will present later, we omit the details here.

Input: a word $w = w[1..n]$.
Output: the suffix tree $T_w$.
 1 begin function make_suffix_tree(w)
 2     construct $T_n = T_{w[n..n]}$ ;
 3     for i from n - 1 to 1 do
 4         $T_i$ ← extend($T_{i+1}$, w[i..n]) ;
           // assert: $T_i = T_{w[i..n]}$
 5     end
 6     return $T_1$ ;
 7 end
 8 begin function extend(tree, word[i..n])
       // we assume tree = $T_{word[i+1..n]}$
 9     find the proper position y in tree to insert the new node $leaf_i$ ;
10     if needed, split an edge x → z into two edges x → y, y → z by adding a new node y ;
11     create and label the edge y → $leaf_i$ by word[i + |τ(y)| .. n] · \$ ;
12 end

Algorithm 1: Framework of Weiner's algorithm for constructing suffix tree
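
The framework of Algorithm 1 can be sketched as follows. This is a naive quadratic-time illustration that inserts the suffixes from shortest to longest, as the framework does, but locates the position $y$ by walking down from the root instead of using Weiner's indicator vectors and inter-node links. The class `Node`, the helper `leaves`, and the assumption that `$` does not occur in the input are choices made for this example.

```python
class Node:
    def __init__(self, depth):
        self.children = {}      # first letter of edge label -> (label, child)
        self.depth = depth      # delta(v) = |tau(v)|, fixed when v is created
        self.leaf_index = None  # i (1-based) if this node is leaf_i

def extend(root, suffix, index):
    """Insert one suffix (ending in '$') as a new leaf with the given index."""
    node, rest = root, suffix
    while True:
        first = rest[0]
        if first not in node.children:
            leaf = Node(node.depth + len(rest))
            leaf.leaf_index = index
            node.children[first] = (rest, leaf)   # edge y -> leaf_i
            return
        label, child = node.children[first]
        m = 0
        while m < len(label) and m < len(rest) and label[m] == rest[m]:
            m += 1
        if m < len(label):
            # Split the edge x -> z into x -> y and y -> z with a new node y.
            y = Node(node.depth + m)
            y.children[label[m]] = (label[m:], child)
            node.children[first] = (label[:m], y)
            child = y
        node, rest = child, rest[m:]

def make_suffix_tree(w):
    """Build the suffix tree of w by adding leaf_n, ..., leaf_2, leaf_1."""
    text = w + "$"
    root = Node(0)
    for i in range(len(w) - 1, -1, -1):
        extend(root, text[i:], i + 1)
    return root

def leaves(node, prefix=""):
    """Yield (i, tau(leaf_i)) for every leaf below node."""
    if node.leaf_index is not None:
        yield node.leaf_index, prefix
    for label, child in node.children.values():
        yield from leaves(child, prefix + label)

tree = make_suffix_tree("0100101001")
print(sorted(leaves(tree))[0])  # (1, '0100101001$')
```

The depth $\delta(v)$ is stored once at node creation, which matches the observation that later splits never change it.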
Once a node $v$ is created, although the node-depth $|v|$ may change in later extensions by splitting an
edge in the path from the root to node $v$, the depth $\delta(v)$ will never change in later extensions in a suffix tree.
So we assume the depth $\delta(v)$ is also stored on the node $v$ in the suffix tree and can be accessed in constant
time. The update of $\delta(v)$ only happens when $v$ is created and can be computed by $\delta(v) = \delta(p(v)) + |u|$,
where $u$ is the label on the edge from $p(v)$ to $v$. So computing and storing the information $\delta$ will not increase
the computational complexity of the construction of a suffix tree.



3 The algorithm for computing ${}^k_s rmp_w$ and ${}^k_s lmp_w$

First we show how the minimal period $mp^k_s(w)$ can be obtained from the suffix tree $T_w$ in time
$O(|w| / \min\{s, mp^2_0(w)\})$. In particular, if $s = \Omega(|w|)$ and $w$ satisfies $mp^2_0(w) = \Omega(|w|)$, then the algorithm
computes $mp^k_s(w)$ in constant time, which is one of the essential ideas in the computation of ${}^k_s rmp_w$ and ${}^k_s lmp_w$.

Lemma 4. Let $k \ge 2$ and $s \ge 0$ be two integers and $T_w$ be the suffix tree of a word $w$. Then $mp^k_s(w)$ can be
computed in $O(|w| / \min\{s, mp^2_0(w)\})$ time.

Proof. Let $n = |w|$. There is an $O(n / \min\{s, mp^2_0(w)\})$-time algorithm to compute $mp^k_s(w)$. First, along
the path from $leaf_1$ to the root, we find the highest ancestor $h$ of $leaf_1$ such that $\delta(h) \ge (k-1)(s+1)$.
Since $\delta(root) = 0$, node $h$ always has a father and $\delta(p(h)) < (k-1)(s+1)$. Then we find the lowest common
ancestor of $leaf_1$ with every other leaf $leaf_i$ that is a descendant of $h$ and check whether the inequality

$$ \delta(lca(leaf_1, leaf_i)) \ge (k-1)(i-1) \qquad (1) $$

holds. If no $leaf_i$ satisfies (1), then $mp^k_s(w) = +\infty$; otherwise, $mp^k_s(w) = i - 1$, where $i$ is the smallest $i$ that
satisfies (1). The algorithm is presented in Algorithm 2.



Input: a suffix tree tree = $T_{w[1..n]}$ and two integers $s \ge 0$, $k \ge 2$.
Output: the minimal period $mp^k_s(w)$.
 1 begin function compute_mp(tree, s, k)
 2     if k(s + 1) > n then return +∞ else h ← $leaf_1$ ;
 3     while δ(p(h)) ≥ (k - 1)(s + 1) do h ← p(h) ;
 4     mp ← +∞ ;
       // linear-time preprocessing for constant-time finding lca
 5     preprocess the tree rooted at h for lca ;
 6     foreach leaf $leaf_i$ being a descendant of h other than $leaf_1$ do
 7         if δ(lca($leaf_1$, $leaf_i$)) ≥ (k - 1)(i - 1) then
               // assert: w[1..i-1] is a period of the word w
 8             if mp > i - 1 then mp ← i - 1 ;
 9         end
10     end
11     return mp ;
12 end

Algorithm 2: Algorithm for computing $mp^k_s(w)$ by using the suffix tree $T_w$



Now we prove the correctness of this algorithm. First we observe that $w = x^k y$ for some non-empty
word $x$ if and only if the common prefix of $w[1..n]$ and $w[|x|+1..n]$ is of length at least $(k-1)|x|$,
which means the leaf $leaf_{|x|+1}$ satisfies (1). Furthermore, $|x| > s$ if and only if $leaf_{|x|+1}$ satisfies
$\delta(lca(leaf_1, leaf_{|x|+1})) \ge (k-1)(s+1)$, which means that $leaf_{|x|+1}$ is a descendant of $h$. (Since $h$ has
two descendants, $h$ is not a leaf and thus $h \ne leaf_{|x|+1}$.) So line 8 is executed if and only if there
is a corresponding prefix of $w$ that is a $k$th power with period of length $i - 1 > s$. The minimal length of
such a period, if any, is returned and the correctness is ensured.

Now we discuss the computational complexity of this algorithm. Let $T_h$ be the subtree rooted at $h$ and
$l$ be the number of leaves in $T_h$. By the definition of suffix tree, each internal node has at least two children
in $T_h$ and thus the number of internal nodes in $T_h$ is less than $l$. Furthermore, the node-depth of any leaf in
$T_h$ is also less than $l$. So the running time of the algorithm is linear in $l$. (For details on the constant-time
algorithm for finding the lowest common ancestor after linear-time preprocessing, see [18].) In order to show
the computation is in $O(n / \min\{s, mp^2_0(w)\})$-time, it remains to see $l = O(n / \min\{s, mp^2_0(w)\})$. We prove
$l \le n / \min\{s+1, mp^2_0(w)\}$ by contradiction. Suppose $l > n / \min\{s+1, mp^2_0(w)\}$. Since there are $l$ leaves
$leaf_{i_1}, leaf_{i_2}, \ldots, leaf_{i_l}$ with the same ancestor $h$, there are $l$ factors of length $t = (k-1)(s+1)$ such that

$$ w[i_1 .. i_1+t-1] = w[i_2 .. i_2+t-1] = \cdots = w[i_l .. i_l+t-1]. $$

Since $1 \le i_j \le n$ for $1 \le j \le l$, by the pigeonhole principle, there are two indices, say $i_1$ and $i_2$, such that
$0 < i_2 - i_1 \le n/l < \min\{s+1, mp^2_0(w)\}$. Then the common prefix of $w[i_1..n]$ and $w[i_2..n]$ is of length at least
$t = (k-1)(s+1) > (k-1)(i_2 - i_1)$, which means there is a prefix of $w[i_1 .. i_1+t-1] = w[i_2 .. i_2+t-1] =
w[1..t]$ that is a $k$th power with period of length $i_2 - i_1$. Then $mp^2_0(w) \le i_2 - i_1 < mp^2_0(w)$, a
contradiction. So the number of leaves in $T_h$ is $l \le n / \min\{s+1, mp^2_0(w)\}$ and thus the algorithm is in
$O(n / \min\{s, mp^2_0(w)\})$-time. □
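
The test in inequality (1) can be illustrated without a suffix tree. The sketch below replaces $\delta(lca(leaf_1, leaf_i))$ by a directly computed longest common prefix of $w[1..n]$ and $w[i..n]$, so it runs in quadratic time and does not reproduce the restriction to the subtree below $h$ that yields the bound of Lemma 4. The names `lcp` and `mp_via_lcp` are chosen for this example.

```python
import math

def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    m = 0
    while m < len(a) and m < len(b) and a[m] == b[m]:
        m += 1
    return m

def mp_via_lcp(w: str, k: int, s: int) -> float:
    """Minimal period > s for exponent k, using the criterion of inequality (1).

    A period i - 1 is accepted when the common prefix of w[1..n] and w[i..n]
    has length at least (k - 1)(i - 1); i is the 1-based start of the suffix.
    """
    n = len(w)
    for i in range(s + 2, n + 1):          # period i - 1 must exceed s
        if lcp(w, w[i - 1:]) >= (k - 1) * (i - 1):
            return i - 1                   # candidates are scanned in increasing order
    return math.inf

print(mp_via_lcp("0100101001", 2, 0))  # 3
print(mp_via_lcp("0100101001", 2, 3))  # 5
```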
For a word $w = w[1..n]$, by definition, the left minimal period array and the right minimal period array
satisfy the equation

$$ {}^k_s lmp_w[i] = {}^k_s rmp_{w^R}[n+1-i], \quad \text{for } 1 \le i \le n. $$

So the left minimal period array of $w$ can be obtained by computing the right minimal period array of $w^R$.
Hence in what follows we only discuss the algorithm for computing the right minimal period array of $w$; the
algorithm for computing the left minimal period array of $w$ follows immediately.
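
The reduction from the left array to the right array can be sketched as follows. The right array is computed here by brute force purely for illustration; only the reversal step reflects the identity above. The function names are chosen for this example.

```python
import math

def mp(w: str, k: int, s: int) -> float:
    """Brute-force minimal period larger than s with respect to exponent k."""
    for m in range(s + 1, len(w) // k + 1):
        if w[:k * m] == w[:m] * k:
            return m
    return math.inf

def rmp(w: str, k: int, s: int) -> list:
    """Right minimal period array (brute force)."""
    return [mp(w[i:], k, s) for i in range(len(w))]

def lmp_via_reverse(w: str, k: int, s: int) -> list:
    """Left minimal period array via lmp_w[i] = rmp_{w^R}[n + 1 - i]."""
    return rmp(w[::-1], k, s)[::-1]

print(lmp_via_reverse("0100101001", 2, 0))
```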

A suffix tree with minimal periods ${}^k_s T_w$ for a word $w$ is a suffix tree $T_w$ with a function $\pi^k_s$, which is defined
at each node $v$ such that $\pi^k_s(v) = mp^k_s(\tau(v))$. By definition, once ${}^k_s T_w$ is created for a word $w = w[1..n]$,
the array ${}^k_s rmp_w$ can be obtained by reading the value $\pi^k_s$ at each leaf in order as follows:

$$ {}^k_s rmp_w[1..n] = [\pi^k_s(leaf_1), \pi^k_s(leaf_2), \ldots, \pi^k_s(leaf_n)]. $$

The suffix tree with minimal periods satisfies the following property.

Lemma 5. Let $k \ge 2$ and $s \ge 0$ be two integers and $w$ be a word. For any node $v$ in the suffix tree with
minimal periods ${}^k_s T_w$ such that $\pi^k_s(p(v)) = +\infty$, either $\pi^k_s(v) = +\infty$ or $\pi^k_s(v)$ satisfies

$$ \frac{\delta(p(v))}{k} < \pi^k_s(v) \le \frac{\delta(p(v))}{k-1}. $$

Proof. Let $v$ be a node in ${}^k_s T_w$ such that $\pi^k_s(p(v)) = +\infty$. Since $\tau(p(v))$ is a prefix of $\tau(v)$ and $\pi^k_s(p(v)) = +\infty$,
by Lemma 2 it follows that

$$ \delta(p(v)) = |\tau(p(v))| < k \cdot mp^k_s(\tau(v)) = k \cdot \pi^k_s(v). $$

Suppose $\pi^k_s(v) \ne +\infty$. The common prefix of $\tau(v)[1..\delta(v)]$ and $\tau(v)[\pi^k_s(v)+1 .. \delta(v)]$ is of length at least
$(k-1)\pi^k_s(v)$. Then $(k-1)\pi^k_s(v) \le \delta(p(v))$, since $p(v)$ is the lowest ancestor of $v$ in ${}^k_s T_w$. Therefore, either
$\pi^k_s(v) = +\infty$ or $\delta(p(v))/k < \pi^k_s(v) \le \delta(p(v))/(k-1)$. □

In what follows, we will show how to construct ${}^k_s T_w$ for a word $w$ with fixed $k$ in linear time by a
modified version of Kosaraju's algorithm [13]. Kosaraju's algorithm constructs only ${}^2_0 T_w$ but our modification
can construct ${}^k_s T_w$ for arbitrary $s \ge 0$ and $k \ge 2$. Both algorithms are based on an alteration of Weiner's
algorithm [23] for constructing the suffix tree $T_w$. Our modified algorithm for computing ${}^k_s T_w$ is illustrated in
Algorithm 3, where the added statements for updating $\pi^k_s$ are underlined. In addition to the suffix tree
$T_i = {}^k_s T_{w[i..n]}$, an auxiliary suffix tree $A = T_{w[p..q]}$ for some proper indices $p, q$ is used.

The main idea is that we use the classic Weiner's algorithm to construct the underlying suffix tree
$T_{w[i..n]}$ step by step. At each step, at most two nodes are created and we update the $\pi$ values on those new
nodes. One possible new node $y$ is between two nodes $x, z$ when a split on the edge from $x$ to $z$ happens.
Since $\pi^k_s(z)$ is already computed, we update $\pi^k_s(y)$ directly. The other new node is the new leaf $leaf_i$.
When $\pi^k_s(p(leaf_i)) \ne +\infty$, we update $\pi^k_s(leaf_i)$ directly. Otherwise, we compute $\pi^k_s(leaf_i)$ by constructing
auxiliary suffix trees. The naive way is to construct $T_{w[i..n]}$ and then to compute $\pi^k_s(leaf_i) = mp^k_s(w[i..n])$,
both of which run in $O(|w[i..n]|) = O(n)$ time. We instead construct a series of trees $A = T_{w[i..j]}$ for some
$j$ in such a way that $mp^k_s(w[i..n]) = mp^k_s(w[i..j])$. In addition, the total cost of constructing the trees $A$ is
in $O(n)$ and the cost of each computation of $\pi^k_s(leaf_i) = mp^k_s(w[i..j])$ in $A$ is in $O(k)$.

Theorem 6. Let $k \ge 2$ and $s \ge 0$ be two integers. Function compute_rmp in Algorithm 3 correctly computes
the right minimal period array ${}^k_s rmp_w$ for the word $w$.

Proof. Since each element ${}^k_s rmp_w[i]$ is assigned the value $\pi^k_s(leaf_i)$ on the leaves of the suffix tree $T_i$ with
minimal periods, the correctness of the algorithm relies on the claim $T_i = {}^k_s T_{w[i..n]}$. The algorithm is based
on Weiner's algorithm and the only change is to update the $\pi^k_s$ values. So the underlying suffix tree of $T_i$
correctly represents the suffix tree $T_{w[i..n]}$. The update to $\pi^k_s(v)$ only happens when the node $v$ is created in
some $T_{w[i..n]}$. By definition, $\pi^k_s(v) = mp^k_s(\tau(v))$ in any expanded suffix tree ${}^k_s T_{w[j..n]}$ for $j < i$ is equal to
Input: a word $w = w[1..n]$ and two integers $s \ge 0$, $k \ge 2$.
Output: the right minimal period array ${}^k_s rmp_w$.
 1 begin function compute_rmp(w, s, k)
 2     construct $T_n$ by constructing $T_{w[n..n]}$ with π(root), π($leaf_n$) ← +∞ ;
 3     A ← empty, j ← n, and d ← 0 ;
 4     for i from n - 1 to 1 do
 5         find the proper position y in $T_{i+1}$ to insert the new node $leaf_i$ ;
 6         if needed then
 7             split an edge x → z into two edges x → y, y → z by adding a new node y ;
 8             if δ(y) ≥ kπ(z) then π(y) ← π(z) else π(y) ← +∞ ;
 9         end
10         create and label the edge y → $leaf_i$ by w[i + |τ(y)| .. n] · \$ ;
           // assert: suffix tree part of $T_i$ is $T_{w[i..n]}$
11         if j - i + 1 > 2kd/(k - 1) or δ(y) < d/2 then A ← empty ;
           // assert: A = empty or (A = $T_{w[i+1..j]}$ and d/2 ≤ δ(p($leaf_i$)) ≤ 2d)
12         if π(y) ≠ +∞ then
13             π($leaf_i$) ← π(y) ;
14             if A = empty then continue ;
15             else A ← extend(A, w[i..j]) ;
16         else
17             if A = empty then
18                 d ← δ(y) and j ← i + (k + 1)d/(k - 1) - 1 ;
19                 A ← make_suffix_tree(w[i..j]) ;
20             else
21                 A ← extend(A, w[i..j]) ;
22             end
23             π($leaf_i$) ← compute_mp(A, max{s, δ(y)/k}, k) ;
24         end
           // assert: for all v in $T_i$: π(v) = $mp^k_s(\tau(v))$ and thus $T_i = {}^k_s T_{w[i..n]}$
25         rmp[i] ← π($leaf_i$) ;
26     end
27     rmp[n] ← +∞ and return rmp ;
28 end

Algorithm 3: Algorithm for computing ${}^k_s rmp_w$
$\pi^k_s(v)$ in the suffix tree ${}^k_s T_{w[i..n]}$ in which $v$ is created. So in order to prove $T_i = {}^k_s T_{w[i..n]}$, it remains to see
that the assignment of $\pi^k_s(v)$ for $v$ is correct when node $v$ is created.

At the beginning, $T_{w[n..n]}$ is a tree of two nodes, the root and one leaf $leaf_n$. We have $\pi^k_s(root) =
mp^k_s(\tau(root)) = mp^k_s(\epsilon) = +\infty$ and $\pi^k_s(leaf_n) = mp^k_s(\tau(leaf_n)) = mp^k_s(w[n..n]) = +\infty$. So the assignments
on line 2 of Algorithm 3 are valid and $T_n = {}^k_s T_{w[n..n]}$.

Suppose it is true that $T_{i+1} = {}^k_s T_{w[i+1..n]}$ for some $i$, $1 \le i \le n-1$, at the beginning of the execution
of lines 5-25. Then on the next execution within the loop at lines 5-25, at most two nodes are
created. One possible new node is $y$, the father of $leaf_i$, and the other is $leaf_i$.

For $\pi(y)$ on line 8: if some split happens on an edge from $x$ to $z$ by adding a new node $y$ and two new
edges from $x$ to $y$ and from $y$ to $z$, respectively, then we have $\tau(z) = \tau(y)u$ for some $u \ne \epsilon$. By Lemma 2,
$mp^k_s(\tau(y)) = mp^k_s(\tau(z))$ if $|\tau(y)| \ge k \cdot mp^k_s(\tau(z))$; otherwise $mp^k_s(\tau(y)) = +\infty$. So the assignment on line 8
of Algorithm 3 is valid.

For $\pi(leaf_i)$ on lines 13 and 23: consider the value $\pi^k_s$ on the new leaf $leaf_i$. Since $y = p(leaf_i)$, we have $\tau(leaf_i) =
\tau(y)v$ for some $v \ne \epsilon$. If $mp^k_s(\tau(y)) \ne +\infty$, by Lemma 1 it follows that $mp^k_s(\tau(leaf_i)) = mp^k_s(\tau(y))$ and thus
the assignment on line 13 of Algorithm 3 is valid. If $mp^k_s(\tau(y)) = +\infty$, then $mp^k_s(\tau(leaf_i)) = mp^k_s(w[i..n])$
is computed with the assistance of the auxiliary suffix tree $A = T_{w[i..j]}$ by the function compute_mp in
Algorithm 2. Since $y = p(leaf_i)$, by Lemma 5, $mp^k_s(\tau(leaf_i)) > \delta(y)/k$ and thus the arguments in the call to
compute_mp are valid. To show the assignment on line 23 of Algorithm 3 is valid, the only thing that remains to
be proved is that $mp^k_s(w[i..n]) = mp^k_s(w[i..j])$.

First we claim that $\delta(p_{T_i}(leaf_i)) \le \delta(p_{T_{i+1}}(leaf_{i+1})) + 1$, where the subscript of $p$ specifies in which tree
the parent is discussed. If $p_{T_i}(leaf_{i+1}) \ne p_{T_{i+1}}(leaf_{i+1})$, then there is a split on the edge from $p_{T_{i+1}}(leaf_{i+1})$
to $leaf_{i+1}$ and leaves $leaf_i$, $leaf_{i+1}$ have the same father in $T_i$. So leaves $leaf_{i+1}$, $leaf_{i+2}$ have the same father
in $T_{i+1}$ and thus $\delta(p_{T_i}(leaf_i)) = \delta(p_{T_i}(leaf_{i+1})) = \delta(p_{T_{i+1}}(leaf_{i+1})) + 1$. If $p_{T_i}(leaf_{i+1}) = p_{T_{i+1}}(leaf_{i+1})$,
then by Lemma 3 it follows that $\delta(p_{T_i}(leaf_i)) \le \delta(p_{T_i}(leaf_{i+1})) + 1 = \delta(p_{T_{i+1}}(leaf_{i+1})) + 1$.

Then we claim $\delta(y) \le j - i + 1 - 2d/(k-1)$ holds right before line 23, where $y = p(leaf_i)$. Consider
the last created suffix tree $A$; then $A \ne$ empty. If $A$ is newly created, then $\delta(p(leaf_i)) = d$ and $i =
j + 1 - (k+1)d/(k-1)$. So $\delta(p(leaf_i)) = j - i + 1 - 2d/(k-1)$. Now we assume $A$ extends from a previous
one. In the procedure of extending $A$, both $j$ and $d$ remain the same, exponent $k$ is a constant, the index $i$
decreases by 1, and the depth $\delta(p_{T_i}(leaf_i))$ increases by at most 1. So, if the bound holds for $i+1$, then
$\delta(p_{T_i}(leaf_i)) \le (j - (i+1) + 1 - 2d/(k-1)) + 1 = j - i + 1 - 2d/(k-1)$, and the bound
still holds.

Now we prove $mp^k_s(w[i..n]) = mp^k_s(w[i..j])$. If $mp^k_s(w[i..n]) = +\infty$, by Lemma 2 it follows that
$mp^k_s(w[i..j]) = +\infty = mp^k_s(w[i..n])$. Now we assume $mp^k_s(w[i..n]) \ne +\infty$. By Lemma 5 it follows that
$mp^k_s(w[i..n]) = mp^k_s(\tau(leaf_i)) \le \delta(y)/(k-1)$. In addition, $j - i + 1 \le 2kd/(k-1)$ always holds when
$A \ne$ empty. So the following holds

$$ k \cdot mp^k_s(w[i..n]) \le \frac{k}{k-1}\left(j - i + 1 - \frac{2d}{k-1}\right) = (j-i+1) + \frac{1}{k-1}\left((j-i+1) - \frac{2kd}{k-1}\right) \le |w[i..j]|, $$

and thus by Lemma 2 again $mp^k_s(w[i..j]) = mp^k_s(w[i..n])$. This finishes the proof of $T_i = {}^k_s T_{w[i..n]}$. □

Theorem 7. Let $k \ge 2$ and $s \ge 0$ be two integers. The time complexity of computing the right minimal
period array ${}^k_s rmp_w$ for input word $w$ in Algorithm 3 is $O(k|w|)$.

Proof. Let $n = |w|$. Each assignment to elements in array rmp at lines 25, 27 of Algorithm 3 can be done
in constant time. So the total time of computing rmp $= {}^k_s rmp_w$ from the suffix tree $T_1 = {}^k_s T_w$ with minimal
periods is in $O(n)$.

The lines 2, 5, 7, 10 of Algorithm 3 constitute exactly Weiner's algorithm for constructing the suffix
tree $T_w$, which is in $O(n)$-time.

Most of the underlined statements, except lines 15, 19, 21, 23, in Algorithm 3 can be done in constant time
in a unit-cost model, where we assume that arithmetic operations, comparisons and assignments of integers
with $O(\log n)$ bits can be done in constant time. The number of executions of lines 5-25 is $n - 1$ and thus
the total cost of those underlined statements is in $O(n)$.
Now we consider the computation of line 23. By Lemma 4, since $A = T_{w[i..j]}$, the cost of each call to
compute_mp in Algorithm 2 is in time linear in

$$ \frac{|w[i..j]|}{\min\{\max\{s, \delta(y)/k\}, mp^2_0(w[i..j])\}} \le \frac{j-i+1}{\min\{\delta(y)/k, mp^2_0(w[i..j])\}}. $$

We already showed in the proof of Theorem 6 that $mp^2_0(w[i..j]) = mp^2_0(w[i..n])$. By Lemma 5, $mp^2_0(w[i..n]) >
\delta(y)/k$. In addition, $j - i + 1 \le 2kd/(k-1)$ and $\delta(y) \ge d/2$ always hold when $A \ne$ empty. So we have

$$ \frac{j-i+1}{\min\{\delta(y)/k, mp^2_0(w[i..j])\}} \le \frac{2kd/(k-1)}{\min\{d/2k, d/2k\}} = \frac{4k^2}{k-1}. $$

The number of executions of lines 5-25 is $n - 1$ and thus the total cost on line 23 is $O(kn)$.

Now we consider the computation of lines 15, 19, 21. Those statements construct a series of suffix trees
$A = T_{w[i..j]}$ by calling make_suffix_tree and extend in Algorithm 1. Each suffix tree is initialized
at line 19, extended at lines 15, 21, and destroyed at line 11. Suppose there are in total $l$ such trees, and
suppose, for $1 \le m \le l$, they are initialized by $A = T_{w[i_m..j_m]}$ with $d_m = \delta(p_{T_{i_m}}(leaf_{i_m}))$ and destroyed
when $A = T_{w[i'_m..j_m]}$ such that $j_m - (i'_m - 1) + 1 > 2kd_m/(k-1)$ or $\delta(p_{T_{i'_m-1}}(leaf_{i'_m-1})) < d_m/2$. In
addition, when $A \ne$ empty, the inequality $j_m - i + 1 \le 2kd_m/(k-1)$ always holds for $i'_m \le i \le i_m$. Since the
construction of suffix tree in Algorithm 1 is in linear time, the total cost on lines 15, 19, 21 is in time linear in

$$ \sum_{m=1}^{l} |w[i'_m..j_m]| = \sum_{m=1}^{l} (j_m - i'_m + 1) \le \frac{2k}{k-1} \sum_{m=1}^{l} d_m. $$

First, we consider those trees $A$ destroyed by the condition $j_m - (i'_m - 1) + 1 > 2kd_m/(k-1)$. Then
$j_m - i'_m + 1 = 2kd_m/(k-1)$ and $j_m = i_m + (k+1)d_m/(k-1) - 1$ hold, and thus the decrease of $i$ is
$i_m - i'_m = (j_m - (k+1)d_m/(k-1) + 1) - (j_m + 1 - 2kd_m/(k-1)) = d_m$. Hence the total cost in this case
is

$$ \frac{2k}{k-1} \sum_{j_m - (i'_m - 1) + 1 > \frac{2k}{k-1} d_m} d_m = \frac{2k}{k-1} \sum_{m} (i_m - i'_m) \le \frac{2k}{k-1} ((n-1) - 1) = O(n). $$

Second, we consider those trees $A$ destroyed by the condition $\delta(p_{T_{i'_m-1}}(leaf_{i'_m-1})) < d_m/2$. Then
$\delta(p_{T_{i'_m-1}}(leaf_{i'_m-1})) - \delta(p_{T_{i_m}}(leaf_{i_m})) < -d_m/2$. In the proof of Theorem 6, we showed
$\delta(p_{T_i}(leaf_i)) - \delta(p_{T_{i+1}}(leaf_{i+1})) \le 1$. Since $\delta(p_{T_1}(leaf_1)) \ge 0$, it follows that the total cost in this case is

$$ \frac{2k}{k-1} \sum_{\delta(p(leaf_{i'_m-1})) < \frac{1}{2} d_m} d_m < \frac{4k}{k-1} \sum_{m} \left(\delta(p_{T_{i_m}}(leaf_{i_m})) - \delta(p_{T_{i'_m-1}}(leaf_{i'_m-1}))\right) \le \frac{4k}{k-1}(n-1) = O(n). $$

The only remaining case is that the suffix tree $A$ is not destroyed even after the construction of $T_1$. This
can be avoided by adding a special character £ not in the alphabet of $w$ at the beginning of $w$. Then for
$i = 1$ the father of $leaf_1$ is the root and thus $A$ is destroyed by the condition $\delta(y) < d/2$. In addition,
$mp^k_s(£ \cdot w) = +\infty$ and thus this modification does not change the computational complexity of the algorithm.
So, the total cost on lines 15, 19, 21 is $O(n)$.

Therefore, the total cost of the algorithm is $O(n) + O(n) + O(n) + O(kn) + O(n)$ and thus is in time
$O(kn)$. The algorithm is in linear time when exponent $k$ is fixed. □



4 Applications — detecting special pseudo-powers 

In this section, we discuss how the linear algorithm for computing ${}^k_s rmp_w$ and ${}^k_s lmp_w$ for fixed exponent $k$
can be applied to test whether a word $w$ contains a particular type of repetition, called pseudo-powers.

Let $\Sigma$ be the alphabet. A function $\phi : \Sigma^* \to \Sigma^*$ is called an involution if $\phi(\phi(w)) = w$ for all $w \in \Sigma^*$
and called an antimorphism if $\phi(uv) = \phi(v)\phi(u)$ for all $u, v \in \Sigma^*$. We call $\phi$ an antimorphic involution if
Figure 2: An example of hair-pin structure made from a pseudo power ACGACGACGCGTACG with respect to
the Watson-Crick complementarity

$\phi$ is both an involution and an antimorphism. For example, the classic Watson-Crick complementarity in
biology is an antimorphic involution over four letters {A, T, C, G} such that A ↦ T, T ↦ A, C ↦ G, G ↦ C.
For integer $k$ and antimorphism $\phi$, we call word $w$ a pseudo $k$th power (with respect to $\phi$) if $w$ can be
written as $w = x_1 x_2 \cdots x_k$ such that either $x_i = x_j$ or $x_i = \phi(x_j)$ for $1 \le i, j \le k$. In particular, we call
a pseudo 2nd power a pseudo square and a pseudo 3rd power a pseudo cube. For example, over the four letters
{A, T, C, G}, word ACGCGT is a pseudo square and ACGTAC is a pseudo cube with respect to the Watson-Crick
complementarity. Pseudo $k$th powers are of particular interest in bio-computing since a single strand of a DNA
sequence of this form can itself make a hair-pin structure as illustrated in Figure 2. A variation on pseudo
$k$th powers has also appeared in tiling problems (see [5]).
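
The examples above can be verified with a short sketch. It assumes that the blocks $x_1, \ldots, x_k$ are non-empty and of equal length, which holds for a length-preserving antimorphic involution such as the Watson-Crick complementarity; the names `watson_crick` and `is_pseudo_power` are chosen for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def watson_crick(w: str) -> str:
    """Antimorphic involution: complement each letter and reverse the word."""
    return "".join(COMPLEMENT[c] for c in reversed(w))

def is_pseudo_power(w: str, k: int, phi) -> bool:
    """Test whether w = x_1 ... x_k with every block equal to x_1 or phi(x_1).

    Because phi is an involution, this is equivalent to requiring
    x_i = x_j or x_i = phi(x_j) for all pairs i, j.
    """
    if k <= 0 or not w or len(w) % k != 0:
        return False
    m = len(w) // k
    x = w[:m]
    allowed = {x, phi(x)}
    return all(w[i:i + m] in allowed for i in range(0, len(w), m))

print(is_pseudo_power("ACGCGT", 2, watson_crick))           # pseudo square
print(is_pseudo_power("ACGTAC", 3, watson_crick))           # pseudo cube
print(is_pseudo_power("ACGACGACGCGTACG", 5, watson_crick))  # word of Figure 2
```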

Chiniforooshan, Kari and Xu [3] discussed the problem of testing whether a word $w$ contains a pseudo
$k$th power as a factor. There is a linear-time algorithm and a quadratic-time algorithm for testing pseudo
squares and pseudo cubes, respectively. But for general exponent $k$, the known algorithm for testing pseudo
$k$th powers is in $O(|w|^2 \log |w|)$.

We will show that the particular types of pseudo $k$th powers $\phi(x)x^{k-1}$, $x^{k-1}\phi(x)$, and
$(x\phi(x))^{k/2}$ (or $(x\phi(x))^{\lfloor k/2 \rfloor} x$ if $k$ is odd) can be tested faster. First we
need the following concept. The centralized maximal pseudo-palindrome array ${}^{\phi}\mathrm{cmp}_w$ of a
word $w$ with respect to an antimorphic involution $\phi$ is defined by

${}^{\phi}\mathrm{cmp}_w[i] = \max\{m : 0 \le m \le \min\{i, |w| - i\},\ \phi(w[i-m+1 .. i]) = w[i+1 .. i+m]\}$ for $0 \le i \le |w|$.

For example, when $\phi$ is the mirror image (reversal of the word),
${}^{\phi}\mathrm{cmp}_{0100101001} = [0,0,0,3,0,0,0,0,2,0,0]$.
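The definition can be evaluated directly, position by position. The following quadratic-time sketch is only meant to pin down the indexing; it assumes that $\phi$ is given by a letter-to-letter map (a dict), and the name `cmp_array_naive` is ours. The binary example in the text is reproduced when the letter map is the identity, that is, when $\phi$ is plain reversal.

```python
def cmp_array_naive(w, letter_map):
    """Centralized maximal pseudo-palindrome array, straight from the definition.

    Assumption: phi maps each letter through letter_map and reverses the word.
    The result has len(w) + 1 entries, indexed 0..n as in the text.
    """
    n = len(w)
    cmp = [0] * (n + 1)
    for i in range(n + 1):
        m = 0
        # 1-based w[i+m+1] is Python w[i+m]; 1-based w[i-m] is Python w[i-m-1].
        while m < min(i, n - i) and w[i + m] == letter_map[w[i - m - 1]]:
            m += 1
        cmp[i] = m
    return cmp


if __name__ == "__main__":
    identity = {"0": "0", "1": "1"}          # phi = reversal
    print(cmp_array_naive("0100101001", identity))
```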

Lemma 8. Let $\phi$ be an antimorphic involution. The centralized maximal pseudo-palindrome array
${}^{\phi}\mathrm{cmp}_w$ of a word $w$ with respect to $\phi$ can be computed in $O(|w|)$ time.

Proof. All maximal palindromes can be found in linear time (for example, see [9, pages 197-198]). In exactly
the same manner, by constructing the suffix tree $T_{w\pounds\phi(w)}$, where £ is a special character not in
the alphabet of $w$, the array ${}^{\phi}\mathrm{cmp}_w$ can be computed in linear time. More precisely, the
algorithm is outlined in Algorithm 4.

Now we prove the correctness of Algorithm 4. Let $n = |w|$ and $\hat{w} = w\pounds\phi(w)$. Then
$|\hat{w}| = 2n + 1$. By the definition of the suffix tree $T_{\hat{w}}$, the word
$\tau(\mathrm{lca}(\mathit{leaf}_{i+1}, \mathit{leaf}_{2n-i+2}))$ is the longest common prefix of
$\tau(\mathit{leaf}_{i+1}) = \hat{w}[i+1 .. 2n+1]$ and $\tau(\mathit{leaf}_{2n-i+2}) = \hat{w}[2n-i+2 .. 2n+1]$.
Since the character £ does not appear in the word $\tau(\mathit{leaf}_{2n-i+2})$ and
$w[1 .. i] = \phi(\tau(\mathit{leaf}_{2n-i+2}))$, it follows that
$\tau(\mathrm{lca}(\mathit{leaf}_{i+1}, \mathit{leaf}_{2n-i+2}))$ is the longest word $u$ such that $u$ is a
prefix of $w[i+1 .. n]$ and $\phi(u)$ is a suffix of $w[1 .. i]$. (Here $\phi$ is an antimorphism, so
applying $\phi$ exchanges the prefix and suffix relations.) This proves the correctness.
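The identity used in this argument, namely that the array entry at $i$ is the length of the longest common prefix of the suffixes of $w\pounds\phi(w)$ starting at positions $i+1$ and $2n-i+2$, can be checked with a deliberately naive sketch. A character-by-character comparison stands in for the suffix tree and the lca query, so this version is quadratic, not linear; the letter-map representation of $\phi$ and the function names are our assumptions.

```python
def phi(word, letter_map):
    """Antimorphic involution assumed to be a letter map followed by reversal."""
    return "".join(letter_map[c] for c in reversed(word))


def cmp_array_via_lcp(w, letter_map, sep="£"):
    """Compute the cmp array through longest common prefixes in w + sep + phi(w).

    Naive stand-in for Algorithm 4: lcp() replaces the suffix tree and lca.
    """
    n = len(w)
    image = phi(w, letter_map)
    if sep in w or sep in image:
        raise ValueError("separator must not occur in the word")
    w_hat = w + sep + image                  # length 2n + 1

    def lcp(a, b):
        """Length of the longest common prefix of w_hat[a:] and w_hat[b:]."""
        length = 0
        while (a + length < len(w_hat) and b + length < len(w_hat)
               and w_hat[a + length] == w_hat[b + length]):
            length += 1
        return length

    cmp = [0] * (n + 1)                      # cmp[0] = cmp[n] = 0
    for i in range(1, n):
        # 1-based suffixes i+1 and 2n-i+2 start at 0-based i and 2n-i+1.
        cmp[i] = lcp(i, 2 * n - i + 1)
    return cmp
```

The separator guarantees that no common prefix runs past the end of $w$, which is why the bound $m \le \min\{i, |w| - i\}$ from the definition needs no explicit check here.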

Both the construction of the suffix tree $T_{\hat{w}}$ and the preprocessing for fast lca queries take linear
time. In addition, the computation of the lca of any pair of leaves takes constant time after the
preprocessing. So the total running time of Algorithm 4 is in $O(|w|)$. □

Input: a word $w = w[1 .. n]$ and an antimorphic involution $\phi$.
Output: the centralized maximal pseudo-palindrome array ${}^{\phi}\mathrm{cmp}_w$.

begin function compute_cmp($w$, $\phi$)
    $T$ ← make_suffix_tree($w\pounds\phi(w)$) ;    // £ is a character not in $w$
    linear-time preprocessing of the tree $T$ for constant-time lca queries ;
    for $i$ from 1 to $n - 1$ do
        cmp[$i$] ← $\delta(\mathrm{lca}(\mathit{leaf}_{i+1}, \mathit{leaf}_{2n-i+2}))$ ;
    end
    cmp[0] ← 0 and cmp[$n$] ← 0 ;
    return cmp ;
end

Algorithm 4: Algorithm for computing ${}^{\phi}\mathrm{cmp}_w$
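For readers who want a runnable linear-time routine without building a suffix tree, the same array can also be obtained by adapting Manacher's palindrome algorithm. This is a different technique from the suffix-tree construction used in the paper and is offered only as a sketch: it assumes that $\phi$ is given by a letter-to-letter involution (a dict), and it relies on the observation that a pseudo-palindrome lying inside a larger pseudo-palindrome is mirrored to an identical factor, so the usual reuse of previously computed radii stays valid.

```python
def cmp_array_manacher(w, letter_map):
    """cmp array via a Manacher-style scan (not the paper's suffix-tree method).

    Assumption: letter_map is an involution on letters, and phi applies it
    letterwise and then reverses. cmp[i] is the largest m such that
    phi(w[i-m+1..i]) = w[i+1..i+m] in the 1-based notation of the text.
    """
    n = len(w)
    cmp = [0] * (n + 1)
    left, right = 0, -1      # 0-based bounds of the rightmost pseudo-palindrome
    for i in range(1, n):
        m = 0
        if i <= right:
            # Reuse the radius of the mirrored center, clipped to the window.
            m = min(cmp[left + right - i + 1], right - i + 1)
        # Extend explicitly while the pseudo-palindrome condition holds.
        while m < min(i, n - i) and w[i + m] == letter_map[w[i - m - 1]]:
            m += 1
        cmp[i] = m
        if i + m - 1 > right:
            left, right = i - m, i + m - 1
    return cmp


if __name__ == "__main__":
    wc = {"A": "T", "T": "A", "C": "G", "G": "C"}
    print(cmp_array_manacher("ACGCGT", wc))
```

As in Manacher's original algorithm, each successful comparison in the inner loop moves `right` forward, which gives the linear bound by the usual amortized argument.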

Theorem 9. Let $k \ge 2$ and $s \ge 0$ be integers and $\phi$ be an antimorphic involution. Whether a word
$w$ contains any factor of the form $x^{k-1}\phi(x)$ (respectively, $\phi(x)x^{k-1}$) with $|x| > s$ can be
tested in $O(k|w|)$ time.

Proof. The main idea is first to compute ${}^{k-1}_{s}\mathrm{lmp}_w$ (respectively,
${}^{k-1}_{s}\mathrm{rmp}_w$) and ${}^{\phi}\mathrm{cmp}_w$, and then to compare the two arrays. There is a
factor of the form $x^{k-1}\phi(x)$ (respectively, $\phi(x)x^{k-1}$) with $|x| > s$ if and only if there is
an index $i$ such that ${}^{k-1}_{s}\mathrm{lmp}_w[i] \le {}^{\phi}\mathrm{cmp}_w[i]$ (respectively,
${}^{k-1}_{s}\mathrm{rmp}_w[i] \le {}^{\phi}\mathrm{cmp}_w[i-1]$). The details of detecting
$x^{k-1}\phi(x)$ are given in Algorithm 5; the case of $\phi(x)x^{k-1}$ is similar.

To see the correctness of Algorithm 5, we prove that the word $w$ contains a factor of the form
$x^{k-1}\phi(x)$ with $|x| > s$ if and only if ${}^{k-1}_{s}\mathrm{lmp}_w[i] \le {}^{\phi}\mathrm{cmp}_w[i]$
holds for some $i$, $1 \le i \le n$, where $n = |w|$. Suppose the inequality
$m = {}^{k-1}_{s}\mathrm{lmp}_w[i] \le {}^{\phi}\mathrm{cmp}_w[i]$ holds for some $i$. Then $w$ contains the
word $w[i-(k-1)m+1 .. i+m]$ of the form $x^{k-1}\phi(x)$ as a factor, and $|x| = m > s$. Now suppose $w$
contains a factor $w[j .. j+kp-1]$ of the form $x^{k-1}\phi(x)$ with $p = |x| > s$. Then, by definition,
${}^{k-1}_{s}\mathrm{lmp}_w[j+(k-1)p-1] \le p$ and ${}^{\phi}\mathrm{cmp}_w[j+(k-1)p-1] \ge p$.
So ${}^{k-1}_{s}\mathrm{lmp}_w[i] \le {}^{\phi}\mathrm{cmp}_w[i]$ holds for $i = j+(k-1)p-1$.

The computation of ${}^{k-1}_{s}\mathrm{lmp}_w$ takes $O(k|w|)$ time and the computation of
${}^{\phi}\mathrm{cmp}_w$ takes $O(|w|)$ time. There are $O(|w|)$ comparisons of integers. So the total
running time of Algorithm 5 is in $O(k|w|)$. □

Input: a word $w = w[1 .. n]$, an antimorphic involution $\phi$, and two integers $s \ge 0$, $k \ge 2$.
Output: "NO" if $w$ contains a factor of the form $x^{k-1}\phi(x)$ with $|x| > s$; "YES" otherwise.

lmp ← compute_lmp($w$, $s$, $k-1$) ;    // rmp ← compute_rmp($w$, $s$, $k-1$) for $\phi(x)x^{k-1}$
cmp ← compute_cmp($w$, $\phi$) ;
for $i$ from 1 to $n$ do
    if lmp[$i$] ≤ cmp[$i$] then return "NO" ;    // rmp[$i$] ≤ cmp[$i-1$] for $\phi(x)x^{k-1}$
end
return "YES" ;

Algorithm 5: Algorithm for testing whether $w$ contains a factor of the form $x^{k-1}\phi(x)$ with $|x| > s$
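The comparison step of Algorithm 5 can be tried out with naive stand-ins for both arrays. In the sketch below, `lmp_array_naive` and `cmp_array_naive` are quadratic-time helpers of our own, not the linear-time procedures of the paper. We assume, as the proof of Theorem 9 suggests, that the left minimal period entry at $i$ is the least period $p > s$ of a power with the given exponent ending at position $i$, and that it is infinite when no such power exists. The function returns `True` when a factor is found, which corresponds to the answer "NO" in Algorithm 5.

```python
import math


def cmp_array_naive(w, letter_map):
    """cmp array from the definition (phi assumed to be a letter map + reversal)."""
    n = len(w)
    cmp = [0] * (n + 1)
    for i in range(n + 1):
        m = 0
        while m < min(i, n - i) and w[i + m] == letter_map[w[i - m - 1]]:
            m += 1
        cmp[i] = m
    return cmp


def lmp_array_naive(w, s, e):
    """Assumed meaning: lmp[i] = least p > s such that an e-th power of
    period p ends at 1-based position i; math.inf if there is none."""
    n = len(w)
    lmp = [math.inf] * (n + 1)
    for i in range(1, n + 1):
        for p in range(s + 1, i // e + 1):
            if w[i - e * p:i] == w[i - p:i] * e:
                lmp[i] = p
                break
    return lmp


def contains_power_then_image(w, letter_map, s, k):
    """True iff w has a factor x^(k-1) phi(x) with |x| > s (requires k >= 2)."""
    lmp = lmp_array_naive(w, s, k - 1)
    cmp = cmp_array_naive(w, letter_map)
    return any(lmp[i] <= cmp[i] for i in range(1, len(w) + 1))


if __name__ == "__main__":
    wc = {"A": "T", "T": "A", "C": "G", "G": "C"}
    # ACG ACG CGT is x x phi(x) with x = ACG.
    print(contains_power_then_image("ACGACGCGT", wc, 0, 3))
```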

Theorem 10. Let $k \ge 2$ and $s \ge 0$ be integers and $\phi$ be an antimorphic involution. Whether a word
$w$ contains any factor of the form $(x\phi(x))^{k/2}$ (or $(x\phi(x))^{\lfloor k/2 \rfloor} x$ if $k$ is
odd) with $|x| > s$ can be tested in $O(|w|^2/k)$ time.

Proof. The main idea is first to compute ${}^{\phi}\mathrm{cmp}_w$ and then to enumerate all possible
indices and periods. There is a factor of the form specified in the theorem if and only if
${}^{\phi}\mathrm{cmp}_w$ has $k - 1$ terms whose indices form an arithmetic progression with common
difference $p > s$ and whose values are all at least $p$. The algorithm is given in Algorithm 6.

To see the correctness of Algorithm 6, we observe that $w$ contains a factor of the form
$w[i .. i+kp-1] = x\phi(x)x\phi(x)\cdots$ with $p = |x| > s$ if and only if the $k - 1$ consecutive terms
${}^{\phi}\mathrm{cmp}_w[i+p-1]$, ${}^{\phi}\mathrm{cmp}_w[i+2p-1]$, ..., ${}^{\phi}\mathrm{cmp}_w[i+(k-1)p-1]$
are all $\ge p > s$.

The computation of ${}^{\phi}\mathrm{cmp}_w$ takes $O(|w|)$ time and the remaining part takes
$O(|w|^2/k)$ time. So the total running time of Algorithm 6 is in $O(|w|^2/k)$. □
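The bound on the remaining part can be made explicit. Write $n = |w|$ and read the loops of Algorithm 6 as $d$ from $s+1$ to $\lfloor n/k \rfloor$, $i$ from $0$ to $d-1$, and $j$ from $1$ to $\lfloor (n-i)/d \rfloor - 1$. The upper bound $\lfloor n/k \rfloor$ on $d$ is our reading of the pseudocode, chosen because $k$ blocks of length $d$ must fit into $n$ letters. Each iteration of the innermost loop costs constant time, so the work is bounded by the number of iterations:

```latex
\sum_{d=s+1}^{\lfloor n/k \rfloor} \sum_{i=0}^{d-1}
  \left( \left\lfloor \frac{n-i}{d} \right\rfloor - 1 \right)
\;\le\; \sum_{d=s+1}^{\lfloor n/k \rfloor} d \cdot \frac{n}{d}
\;\le\; \frac{n}{k} \cdot n
\;=\; \frac{n^{2}}{k}.
```

In words, for each candidate period $d$ every position of the array is inspected at most once, and there are at most $n/k$ candidate periods.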



Input: a word $w = w[1 .. n]$, an antimorphic involution $\phi$, and two integers $s \ge 0$, $k \ge 2$.
Output: "NO" if $w$ contains a factor of the form $(x\phi(x))^{k/2}$ (or $(x\phi(x))^{\lfloor k/2 \rfloor} x$
if $k$ is odd) with $|x| > s$; "YES" otherwise.

cmp ← compute_cmp($w$, $\phi$) ;
for $d$ from $s + 1$ to $\lfloor n/k \rfloor$ do
    for $i$ from 0 to $d - 1$ do
        consecutive ← 0 ;
        for $j$ from 1 to $\lfloor (n - i)/d \rfloor - 1$ do
            if cmp[$i + jd$] ≥ $d$ then consecutive ← consecutive + 1
            else consecutive ← 0 ;
            if consecutive ≥ $k - 1$ then return "NO" ;
        end
    end
end
return "YES" ;

Algorithm 6: Algorithm for testing whether $w$ contains a factor of the form $(x\phi(x))^{k/2}$ with $|x| > s$
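The enumeration of Algorithm 6 translates almost line by line into the following sketch. The array is computed by a naive quadratic helper of our own instead of Algorithm 4, $\phi$ is assumed to be given by a letter-to-letter map, and the upper bound `n // k` on the period is our reading of the garbled loop bound. The function returns `True` when a factor is found, which corresponds to the answer "NO" in the pseudocode.

```python
def cmp_array_naive(w, letter_map):
    """cmp array from the definition (phi assumed to be a letter map + reversal)."""
    n = len(w)
    cmp = [0] * (n + 1)
    for i in range(n + 1):
        m = 0
        while m < min(i, n - i) and w[i + m] == letter_map[w[i - m - 1]]:
            m += 1
        cmp[i] = m
    return cmp


def contains_alternating_pseudo_power(w, letter_map, s, k):
    """True iff w has a factor x phi(x) x phi(x) ... of k blocks with |x| > s.

    Requires k >= 2. Looks for k - 1 consecutive indices i + d, i + 2d, ...
    whose cmp values are all at least the candidate period d.
    """
    n = len(w)
    cmp = cmp_array_naive(w, letter_map)
    for d in range(s + 1, n // k + 1):          # candidate period |x| = d
        for i in range(d):                      # residue class modulo d
            consecutive = 0
            for j in range(1, (n - i) // d):    # j = 1 .. floor((n-i)/d) - 1
                if cmp[i + j * d] >= d:
                    consecutive += 1
                else:
                    consecutive = 0
                if consecutive >= k - 1:
                    return True
    return False


if __name__ == "__main__":
    wc = {"A": "T", "T": "A", "C": "G", "G": "C"}
    # AC GT AC GT is x phi(x) x phi(x) with x = AC.
    print(contains_alternating_pseudo_power("ACGTACGT", wc, 0, 4))
```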



5 Conclusion 

We generalized Kosaraju's linear-time algorithm for computing the minimal squares that start at each
position in a word, which in our notation is the array ${}^{2}_{0}\mathrm{rmp}_w$. We presented a modified
version of his algorithm that computes, for arbitrary integers $k \ge 2$ and $s \ge 0$, the minimal $k$th
powers with period larger than $s$ that start at each position (to the left and to the right) in a word.
In our notation these are the right minimal period array ${}^{k}_{s}\mathrm{rmp}_w$ and the left minimal
period array ${}^{k}_{s}\mathrm{lmp}_w$. The algorithm runs in $O(k|w|)$ time.

The algorithm is based on the framework of Weiner's suffix tree construction. Although there are other
linear-time suffix tree construction algorithms, such as McCreight's algorithm and Ukkonen's algorithm,
neither of the two can be altered to compute minimal period arrays with the same efficiency. The reason
lies in two special requirements: the suffixes of the given word must be added from the shortest to the
longest, and $\pi^{k}(v)$ is updated only when the node $v$ is created.

We showed that the $O(k|w|)$-time algorithm for computing minimal period arrays can be used to test whether
a given word $w$ contains any factor of the form $x^{k}\phi(x)$ (respectively, $\phi(x)x^{k}$) with
$|x| > s$. We also discussed an $O(|w|^2/k)$-time algorithm for testing whether a given word $w$ contains
any factor of the form $(x\phi(x))^{k/2}$ (or $(x\phi(x))^{\lfloor k/2 \rfloor} x$ if $k$ is odd) with
$|x| > s$. All the words $xx \cdots x\phi(x)$, $\phi(x)x \cdots xx$, and $x\phi(x)x\phi(x) \cdots$ are
pseudo-powers. It is possible that some particular types of pseudo-powers other than the ones we
discussed can also be detected faster than the known $O(|w|^2 \log |w|)$-time algorithm.